Overview

Dataset statistics

Number of variables: 12
Number of observations: 2773
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 260.1 KiB
Average record size in memory: 96.0 B

Variable types

NUM: 12

Warnings

qty_items is highly correlated with gross_revenue [High correlation]
gross_revenue is highly correlated with qty_items [High correlation]
avg_ticket is highly skewed (γ1 = 27.67812665) [Skewed]
frequency is highly skewed (γ1 = 46.07732187) [Skewed]
qtd_returns is highly skewed (γ1 = 21.6260127) [Skewed]
customer_id has unique values [Unique]
recency_days has 33 (1.2%) zeros [Zeros]
qtd_returns has 1481 (53.4%) zeros [Zeros]
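
The skewness and zero-count warnings above can be checked by hand. The sketch below assumes that γ1 is the bias-adjusted sample skewness (the estimator that pandas' `Series.skew` computes) and that a column is flagged when |γ1| exceeds 20. That cut-off is an assumption, chosen because it is consistent with this report (gross_revenue at 17.1 is not flagged, qtd_returns at 21.6 is). The function names and the toy column are hypothetical.

```python
import math

SKEW_THRESHOLD = 20  # assumed cut-off, not stated in the report


def adjusted_skewness(values):
    """Bias-adjusted Fisher-Pearson sample skewness (often written G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: treat as not skewed
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def is_highly_skewed(values, threshold=SKEW_THRESHOLD):
    """True when the absolute skewness exceeds the (assumed) threshold."""
    return abs(adjusted_skewness(values)) > threshold


def zero_share(values):
    """Return (number of zeros, percentage of zeros)."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100.0 * zeros / len(values)


toy = [0] * 9 + [1000]  # hypothetical column: mostly zeros, one large value
print(adjusted_skewness(toy), is_highly_skewed(toy), zero_share(toy))
```

The zero percentages in the warnings follow the same arithmetic: 33 / 2773 rounds to 1.2% and 1481 / 2773 rounds to 53.4%.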

Reproduction

Analysis started: 2022-09-21 15:59:10.029854
Analysis finished: 2022-09-21 15:59:35.430838
Duration: 25.4 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2773
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15285.28128
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12626.6
Q1: 13815
Median: 15241
Q3: 16780
95-th percentile: 17950.4
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2965

Descriptive statistics

Standard deviation: 1715.152588
Coefficient of variation (CV): 0.1122094226
Kurtosis: -1.207029283
Mean: 15285.28128
Median Absolute Deviation (MAD): 1484
Skewness: 0.0166125065
Sum: 42386085
Variance: 2941748.399
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
14335                 1      < 0.1%
13636                 1      < 0.1%
17744                 1      < 0.1%
17742                 1      < 0.1%
13644                 1      < 0.1%
17738                 1      < 0.1%
15689                 1      < 0.1%
17736                 1      < 0.1%
15687                 1      < 0.1%
17734                 1      < 0.1%
Other values (2763)   2763   99.6%

Value   Count  Frequency (%)
12347   1      < 0.1%
12348   1      < 0.1%
12352   1      < 0.1%
12356   1      < 0.1%
12358   1      < 0.1%

Value   Count  Frequency (%)
18287   1      < 0.1%
18283   1      < 0.1%
18282   1      < 0.1%
18273   1      < 0.1%
18272   1      < 0.1%
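
Several of the statistics reported for each variable are derived from the others. As a rough consistency check, the sketch below recomputes range, IQR, coefficient of variation and variance from the values reported for customer_id above. The helper name `derived_stats` is hypothetical, and the inputs are the rounded figures from the report, so the results only agree to the printed precision.

```python
def derived_stats(minimum, q1, q3, maximum, mean, std):
    """Recompute the derived statistics from the primary ones."""
    return {
        "range": maximum - minimum,  # Maximum - Minimum
        "iqr": q3 - q1,              # Q3 - Q1
        "cv": std / mean,            # standard deviation / mean
        "variance": std ** 2,        # standard deviation squared
    }


# Values copied from the customer_id block of the report.
customer_id = derived_stats(
    minimum=12347, q1=13815, q3=16780, maximum=18287,
    mean=15285.28128, std=1715.152588,
)
print(customer_id)
```

The reported mean can be checked the same way: Sum / Number of observations = 42386085 / 2773.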
 

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2756
Distinct (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2845.044446
Minimum: 36.56
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 36.56
5-th percentile: 264.548
Q1: 628.78
Median: 1169.94
Q3: 2424.04
95-th percentile: 7490.982
Maximum: 279138.02
Range: 279101.46
Interquartile range (IQR): 1795.26

Descriptive statistics

Standard deviation: 10466.82835
Coefficient of variation (CV): 3.678968308
Kurtosis: 372.786099
Mean: 2845.044446
Median Absolute Deviation (MAD): 687.93
Skewness: 17.09734698
Sum: 7889308.25
Variance: 109554495.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
331                   2      0.1%
1314.45               2      0.1%
734.94                2      0.1%
1353.74               2      0.1%
889.93                2      0.1%
2053.02               2      0.1%
1078.96               2      0.1%
1025.44               2      0.1%
918.2                 2      0.1%
731.9                 2      0.1%
Other values (2746)   2753   99.3%

Value   Count  Frequency (%)
36.56   1      < 0.1%
52      1      < 0.1%
52.2    1      < 0.1%
62.43   1      < 0.1%
68.84   1      < 0.1%

Value       Count  Frequency (%)
279138.02   1      < 0.1%
259657.3    1      < 0.1%
194550.79   1      < 0.1%
140450.72   1      < 0.1%
124564.53   1      < 0.1%

recency_days
Real number (ℝ≥0)

ZEROS

Distinct: 252
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.64731338
Minimum: 0
Maximum: 372
Zeros: 33
Zeros (%): 1.2%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 10
Median: 29
Q3: 73
95-th percentile: 211
Maximum: 372
Range: 372
Interquartile range (IQR): 63

Descriptive statistics

Standard deviation: 68.42352582
Coefficient of variation (CV): 1.207886513
Kurtosis: 3.430442793
Mean: 56.64731338
Median Absolute Deviation (MAD): 23
Skewness: 1.898015296
Sum: 157083
Variance: 4681.778886
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1                    99     3.6%
4                    87     3.1%
2                    85     3.1%
3                    85     3.1%
8                    76     2.7%
10                   67     2.4%
9                    66     2.4%
7                    65     2.3%
17                   62     2.2%
22                   55     2.0%
Other values (242)   2026   73.1%

Value   Count  Frequency (%)
0       33     1.2%
1       99     3.6%
2       85     3.1%
3       85     3.1%
4       87     3.1%

Value   Count  Frequency (%)
372     1      < 0.1%
366     1      < 0.1%
360     1      < 0.1%
358     3      0.1%
354     1      < 0.1%

qty_invoices
Real number (ℝ≥0)

Distinct: 55
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.054814281
Minimum: 2
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 2
Median: 4
Q3: 6
95-th percentile: 17
Maximum: 206
Range: 204
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 9.072771138
Coefficient of variation (CV): 1.498439212
Kurtosis: 183.9078799
Mean: 6.054814281
Median Absolute Deviation (MAD): 2
Skewness: 10.62384214
Sum: 16790
Variance: 82.31517613
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2                   779    28.1%
3                   499    18.0%
4                   393    14.2%
5                   237    8.5%
6                   173    6.2%
7                   138    5.0%
8                   98     3.5%
9                   69     2.5%
10                  55     2.0%
11                  54     1.9%
Other values (45)   278    10.0%

Value   Count  Frequency (%)
2       779    28.1%
3       499    18.0%
4       393    14.2%
5       237    8.5%
6       173    6.2%

Value   Count  Frequency (%)
206     1      < 0.1%
199     1      < 0.1%
124     1      < 0.1%
97      1      < 0.1%
91      2      0.1%

qty_items
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1638
Distinct (%): 59.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1671.783988
Minimum: 2
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 119.6
Q1: 330
Median: 704
Q3: 1478
95-th percentile: 4614
Maximum: 196844
Range: 196842
Interquartile range (IQR): 1148

Descriptive statistics

Standard deviation: 5890.699162
Coefficient of variation (CV): 3.523600658
Kurtosis: 485.4627718
Mean: 1671.783988
Median Absolute Deviation (MAD): 452
Skewness: 18.17697463
Sum: 4635857
Variance: 34700336.61
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
310                   11     0.4%
246                   8      0.3%
150                   8      0.3%
300                   7      0.3%
394                   7      0.3%
516                   7      0.3%
219                   7      0.3%
1200                  7      0.3%
493                   7      0.3%
272                   7      0.3%
Other values (1628)   2697   97.3%

Value   Count  Frequency (%)
2       1      < 0.1%
16      1      < 0.1%
17      1      < 0.1%
19      1      < 0.1%
20      1      < 0.1%

Value    Count  Frequency (%)
196844   1      < 0.1%
80263    1      < 0.1%
77373    1      < 0.1%
69993    1      < 0.1%
64549    1      < 0.1%

qty_products
Real number (ℝ≥0)

Distinct: 467
Distinct (%): 16.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 129.7890371
Minimum: 2
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 10
Q1: 34
Median: 72
Q3: 143
95-th percentile: 400.2
Maximum: 7838
Range: 7836
Interquartile range (IQR): 109

Descriptive statistics

Standard deviation: 277.8250768
Coefficient of variation (CV): 2.140589706
Kurtosis: 336.7416641
Mean: 129.7890371
Median Absolute Deviation (MAD): 45
Skewness: 15.34716552
Sum: 359905
Variance: 77186.7733
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
28                   38     1.4%
35                   34     1.2%
26                   30     1.1%
27                   30     1.1%
29                   30     1.1%
25                   27     1.0%
15                   27     1.0%
31                   27     1.0%
19                   27     1.0%
33                   26     0.9%
Other values (457)   2477   89.3%

Value   Count  Frequency (%)
2       11     0.4%
3       12     0.4%
4       16     0.6%
5       16     0.6%
6       24     0.9%

Value   Count  Frequency (%)
7838    1      < 0.1%
5673    1      < 0.1%
5095    1      < 0.1%
4580    1      < 0.1%
2698    1      < 0.1%

avg_ticket
Real number (ℝ≥0)

SKEWED

Distinct: 2771
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.10411414
Minimum: 2.150588235
Maximum: 4453.43
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 2.150588235
5-th percentile: 4.852453809
Q1: 12.41668
Median: 17.94081081
Q3: 25.02555556
95-th percentile: 87.75747368
Maximum: 4453.43
Range: 4451.279412
Interquartile range (IQR): 12.60887556

Descriptive statistics

Standard deviation: 107.6316856
Coefficient of variation (CV): 3.352582324
Kurtosis: 1054.619311
Mean: 32.10411414
Median Absolute Deviation (MAD): 6.337289189
Skewness: 27.67812665
Sum: 89024.70852
Variance: 11584.57974
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
4.162                 2      0.1%
14.47833333           2      0.1%
17.56944444           1      < 0.1%
11.50465753           1      < 0.1%
18.57558824           1      < 0.1%
21.52698113           1      < 0.1%
8.581532258           1      < 0.1%
81.6                  1      < 0.1%
33.925                1      < 0.1%
54.59176471           1      < 0.1%
Other values (2761)   2761   99.6%

Value         Count  Frequency (%)
2.150588235   1      < 0.1%
2.4325        1      < 0.1%
2.462371134   1      < 0.1%
2.511241379   1      < 0.1%
2.515333333   1      < 0.1%

Value         Count  Frequency (%)
4453.43       1      < 0.1%
1687.2        1      < 0.1%
952.9875      1      < 0.1%
872.13        1      < 0.1%
841.0214493   1      < 0.1%

avg_recency_days
Real number (ℝ≥0)

Distinct: 1155
Distinct (%): 41.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 78.74898658
Minimum: 1
Maximum: 366
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 13
Q1: 34.125
Median: 59
Q3: 99
95-th percentile: 224
Maximum: 366
Range: 365
Interquartile range (IQR): 64.875

Descriptive statistics

Standard deviation: 66.48880369
Coefficient of variation (CV): 0.8443131343
Kurtosis: 3.688673291
Mean: 78.74898658
Median Absolute Deviation (MAD): 30
Skewness: 1.830949046
Sum: 218370.9398
Variance: 4420.761016
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
70                    21     0.8%
46                    18     0.6%
55                    17     0.6%
91                    16     0.6%
49                    16     0.6%
31                    16     0.6%
35                    15     0.5%
42                    15     0.5%
21                    15     0.5%
14                    14     0.5%
Other values (1145)   2610   94.1%

Value         Count  Frequency (%)
1             9      0.3%
2             4      0.1%
2.861538462   1      < 0.1%
3             6      0.2%
3.330357143   1      < 0.1%

Value   Count  Frequency (%)
366     1      < 0.1%
365     1      < 0.1%
364     1      < 0.1%
363     1      < 0.1%
357     2      0.1%

frequency
Real number (ℝ≥0)

SKEWED

Distinct: 1225
Distinct (%): 44.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04971312176
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.008746355685
Q1: 0.01578947368
Median: 0.0243902439
Q3: 0.04166666667
95-th percentile: 0.1153846154
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.02587719298

Descriptive statistics

Standard deviation: 0.3376551076
Coefficient of variation (CV): 6.792072107
Kurtosis: 2295.704265
Mean: 0.04971312176
Median Absolute Deviation (MAD): 0.01069161377
Skewness: 46.07732187
Sum: 137.8544866
Variance: 0.1140109717
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
0.0625                18     0.6%
0.02777777778         17     0.6%
0.02380952381         16     0.6%
0.09090909091         15     0.5%
0.08333333333         15     0.5%
0.02941176471         14     0.5%
0.03448275862         14     0.5%
0.02564102564         13     0.5%
0.01923076923         13     0.5%
0.03571428571         13     0.5%
Other values (1215)   2625   94.7%

Value            Count  Frequency (%)
0.005449591281   1      < 0.1%
0.005464480874   1      < 0.1%
0.005479452055   1      < 0.1%
0.005494505495   1      < 0.1%
0.005586592179   2      0.1%

Value         Count  Frequency (%)
17            1      < 0.1%
3             1      < 0.1%
2             1      < 0.1%
1.142857143   1      < 0.1%
1             8      0.3%

qtd_returns
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 204
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.97367472
Minimum: 0
Maximum: 9014
Zeros: 1481
Zeros (%): 53.4%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 9
95-th percentile: 96.8
Maximum: 9014
Range: 9014
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 290.7142899
Coefficient of variation (CV): 8.312374729
Kurtosis: 571.7456843
Mean: 34.97367472
Median Absolute Deviation (MAD): 0
Skewness: 21.6260127
Sum: 96982
Variance: 84514.79837
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    1481   53.4%
1                    129    4.7%
2                    117    4.2%
3                    82     3.0%
4                    72     2.6%
6                    63     2.3%
5                    55     2.0%
12                   45     1.6%
8                    39     1.4%
9                    38     1.4%
Other values (194)   652    23.5%

Value   Count  Frequency (%)
0       1481   53.4%
1       129    4.7%
2       117    4.2%
3       82     3.0%
4       72     2.6%

Value   Count  Frequency (%)
9014    1      < 0.1%
8004    1      < 0.1%
4427    1      < 0.1%
3768    1      < 0.1%
3332    1      < 0.1%

avg_basket_size
Real number (ℝ≥0)

Distinct: 1937
Distinct (%): 69.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 231.446111
Minimum: 1
Maximum: 6009.333333
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45
Q1: 103.3333333
Median: 172
Q3: 278.2
95-th percentile: 585.7
Maximum: 6009.333333
Range: 6008.333333
Interquartile range (IQR): 174.8666667

Descriptive statistics

Standard deviation: 261.7394008
Coefficient of variation (CV): 1.130887012
Kurtosis: 115.4829522
Mean: 231.446111
Median Absolute Deviation (MAD): 81
Skewness: 7.715252816
Sum: 641800.0657
Variance: 68507.51392
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
100                   11     0.4%
86                    9      0.3%
60                    8      0.3%
75                    8      0.3%
136                   7      0.3%
197                   7      0.3%
105                   7      0.3%
208                   7      0.3%
73                    7      0.3%
82                    7      0.3%
Other values (1927)   2695   97.2%

Value         Count  Frequency (%)
1             1      < 0.1%
3.333333333   1      < 0.1%
5.333333333   1      < 0.1%
5.666666667   1      < 0.1%
6.142857143   1      < 0.1%

Value         Count  Frequency (%)
6009.333333   1      < 0.1%
3868.65       1      < 0.1%
2880          1      < 0.1%
2733.944444   1      < 0.1%
2518.769231   1      < 0.1%

avg_unique_basket_size
Real number (ℝ≥0)

Distinct: 897
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.14109332
Minimum: 0.2
Maximum: 177
Zeros: 0
Zeros (%): 0.0%
Memory size: 21.7 KiB

Quantile statistics

Minimum: 0.2
5-th percentile: 2
Q1: 7.545454545
Median: 13.5
Q3: 22
95-th percentile: 45.1
Maximum: 177
Range: 176.8
Interquartile range (IQR): 14.45454545

Descriptive statistics

Standard deviation: 14.26277434
Coefficient of variation (CV): 0.832080782
Kurtosis: 10.00830576
Mean: 17.14109332
Median Absolute Deviation (MAD): 6.666666667
Skewness: 2.24639468
Sum: 47532.25179
Variance: 203.4267318
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
8                    34     1.2%
13                   33     1.2%
7                    32     1.2%
9                    32     1.2%
16                   32     1.2%
12                   30     1.1%
18.5                 29     1.0%
14                   29     1.0%
17                   29     1.0%
15                   29     1.0%
Other values (887)   2464   88.9%

Value          Count  Frequency (%)
0.2            1      < 0.1%
0.25           3      0.1%
0.3333333333   6      0.2%
0.4            1      < 0.1%
0.4090909091   1      < 0.1%

Value   Count  Frequency (%)
177     1      < 0.1%
105     1      < 0.1%
104     1      < 0.1%
98      1      < 0.1%
95.5    1      < 0.1%

Interactions

[Pairwise interaction plots for the 12 variables (144 panels); images not included.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
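
The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the code the report generator runs; the function name `pearson_r` and the sample lists are hypothetical. The 1/(n-1) factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the common 1/(n-1) factor cancels out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4]  # hypothetical sample
y = [1, 3, 2, 5]  # hypothetical sample
print(pearson_r(x, y))
# Shifting and rescaling x (with a positive factor) leaves r unchanged.
print(pearson_r([10 * a + 3 for a in x], y))
```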

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
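
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. Libraries such as pandas and SciPy default to tau-b, which adds a correction for ties, so values can differ when the data contain ties. The function name and sample lists are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 is a tie in x or y and counts as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]  # hypothetical sample
y = [1, 3, 2, 4]  # one swapped pair relative to x
print(kendall_tau_a(x, y))
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.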

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

[Missing-value plots (2 figures); images not included.]

Sample

First rows

      customer_id  gross_revenue  recency_days  qty_invoices  qty_items  qty_products  avg_ticket  avg_recency_days  frequency  qtd_returns  avg_basket_size  avg_unique_basket_size
   0        17850        5391.21         372.0          34.0     1733.0         297.0   18.152222          1.000000  17.000000         40.0        50.970588                0.617647
   1        13047        3232.59          56.0           9.0     1390.0         171.0   18.904035         52.833333   0.028302         35.0       154.444444               11.666667
   2        12583        6705.38           2.0          15.0     5028.0         232.0   28.902500         26.500000   0.040323         50.0       335.200000                7.600000
   3        13748         948.25          95.0           5.0      439.0          28.0   33.866071         92.666667   0.017921          0.0        87.800000                4.800000
   4        15100         876.00         333.0           3.0       80.0           3.0  292.000000         20.000000   0.073171         22.0        26.666667                0.333333
   5        15291        4623.30          25.0          14.0     2102.0         102.0   45.326471         26.769231   0.040115         29.0       150.142857                4.357143
   6        14688        5630.87           7.0          21.0     3621.0         327.0   17.219786         19.263158   0.057221        399.0       172.428571                7.047619
   7        17809        5411.91          16.0          12.0     2057.0          61.0   88.719836         39.666667   0.033520         41.0       171.416667                3.833333
   8        15311       60767.90           0.0          91.0    38194.0        2379.0   25.543464          4.191011   0.243316        474.0       419.714286                6.230769
   9        16098        2005.63          87.0           7.0      613.0          67.0   29.934776         47.666667   0.024390          0.0        87.571429                4.857143

Last rows

      customer_id  gross_revenue  recency_days  qty_invoices  qty_items  qty_products  avg_ticket  avg_recency_days  frequency  qtd_returns  avg_basket_size  avg_unique_basket_size
2763        17290         525.24           3.0           2.0      404.0         102.0    5.149412              13.0   0.142857          0.0       202.000000               46.000000
2764        14785          77.40          10.0           2.0       84.0           3.0   25.800000               5.0   0.333333          0.0        42.000000                1.000000
2765        17254         272.44           4.0           2.0      252.0         112.0    2.432500              11.0   0.166667          0.0       126.000000               50.000000
2766        17232         421.52           2.0           2.0      203.0          36.0   11.708889              12.0   0.153846          0.0       101.500000               15.000000
2767        17468         137.00          10.0           2.0      116.0           5.0   27.400000               4.0   0.400000          0.0        58.000000                2.500000
2768        13596         697.04           5.0           2.0      406.0         166.0    4.199036               7.0   0.250000          0.0       203.000000               66.500000
2769        14893        1237.85           9.0           2.0      799.0          73.0   16.956849               2.0   0.666667          0.0       399.500000               36.000000
2770        14126         706.13           7.0           3.0      508.0          15.0   47.075333               3.0   0.750000         50.0       169.333333                4.666667
2771        13521        1092.39           1.0           3.0      733.0         435.0    2.511241               4.5   0.300000          0.0       244.333333              104.000000
2772        15060         301.84           8.0           4.0      262.0         120.0    2.515333               1.0   2.000000          0.0        65.500000               20.000000